Posters - Schedules

Wednesday, November 9, 2022 between 8:30 AM and 9:30 AM
Thursday, November 10, 2022 between 8:30 AM and 9:30 AM
Friday, November 11, 2022 between 8:30 AM and 9:30 AM
Virtual: A GWAS and functional regulatory databases approach to identifying key obesity genes
COSI: rsg
  • Mia Yang Ang, The University of Tokyo, Japan
  • Fumihiko Takeuchi, Research Institute, National Center for Global Health and Medicine, Japan
  • Norihiro Kato, Research Institute, National Center for Global Health and Medicine, Japan


Presentation Overview

[Introduction] Obesity is a complex, multifactorial disease defined by excessive accumulation and storage of fat in the body. Obesity is often reported to increase susceptibility to other chronic diseases, including heart disease, diabetes, hypertension, sleep apnea and certain types of cancer. While genome-wide association studies (GWAS) have successfully identified a large number of susceptibility loci for obesity, the underlying mechanisms remain to be clarified.

[Objective] To systematically perform regulatory functional annotation of obesity-associated SNPs detected through GWAS.

[Method] Summary statistics from publicly available GWAS data were used in this study. We used several regulatory functional databases, each developed for a different purpose (RegulomeDB, CardioGxE, TWAS hub and Expression Atlas), to identify and prioritize eQTL genes, gene-environment interactions and genes expressed during obesity progression. Protein-protein interaction (PPI) network analysis was then conducted using the Search Tool for the Retrieval of Interacting Genes (STRING) to identify hub genes. Lastly, gene set enrichment analyses were conducted using Enrichr to discover enriched biological processes and pathways associated with obesity.
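The abstract does not state how hub genes were defined in the STRING network. Node degree is a common criterion, so the following is a minimal Python sketch assuming a degree-based definition; the function name `degree_hubs` and the toy edge list (placeholder gene names, not STRING interactions) are hypothetical.

```python
from collections import defaultdict

def degree_hubs(edges, top_n):
    """Rank nodes of an undirected network by degree and return the top_n.

    edges: iterable of (gene_a, gene_b) pairs, e.g. exported from a PPI
    database after an (assumed) confidence cutoff. Duplicate edges and
    self-loops are ignored; ties are broken alphabetically.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # skip self-loops
        neighbours[a].add(b)
        neighbours[b].add(a)
    degree = {gene: len(nbrs) for gene, nbrs in neighbours.items()}
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]

# Hypothetical toy network with one duplicated edge.
toy_edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"),
             ("G2", "G3"), ("G4", "G5"), ("G1", "G2")]
print(degree_hubs(toy_edges, top_n=2))  # [('G1', 3), ('G2', 2)]
```

Other centrality measures (betweenness, closeness) can rank hubs differently; degree is used here only because it is the simplest to state.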

[Results] Overall, we collated a total of 171,726 obesity-associated SNPs, grouped into 1,051 unique loci. Our functional analyses identified a set of ~1,200 genes deemed important in the pathogenesis of obesity, including 750 eQTL genes (e.g., ADCY3 and SPI1), 23 genes with significant gene-environment interactions (e.g., PPARG and TCF7L2) and 509 TWAS genes detectable in obesity-associated tissues. PPI network analysis revealed a total of 286 hub genes that may regulate energy intake and expenditure and act as principal drivers of obesity progression. Gene Ontology (GO) analyses showed that the hub genes were mainly involved in various cell surface receptor signaling pathways (notably cytokine-mediated and noncanonical Wnt signaling pathways), which were significantly correlated with adipometrics and insulin resistance. Pathway enrichment analyses further indicated that, through the effects of body fat, obesity genes might confer risk for several diseases, such as cancers, myocarditis, type 1 diabetes, leishmaniasis and Alzheimer's disease.

[Conclusions] Our computational analysis, which combined obesity GWAS with the expression of obesity-related genes, uncovered potential functional regulatory mechanisms of obesity. Our data provide targets and clues for future functional analyses of the pathophysiology and pathogenesis of obesity.

Virtual: A robust computational framework to benchmark spatiotemporal trajectory association analysis in single-cell spatial transcriptomics
COSI: rsg
  • Juan Vargas, The University of Colorado, United States
  • Douglas Fritz, The University of Colorado, United States
  • Fan Zhang, The University of Colorado, United States


Presentation Overview

The recent rapid development of spatial transcriptomics (ST) technologies provides new ways to characterize gene expression patterns together with spatial information. Compared with non-spatial single-cell transcriptomics, ST data offer a unique opportunity to unravel spatial and temporal information simultaneously, which is crucial for understanding the pathogenic cell lineages that contribute to disease progression. A few machine learning and deep learning algorithms have been developed to identify these spatiotemporal trajectories. However, fitting overdispersed ST data requires an appropriate statistical model, a step that is usually neglected in spatiotemporal association analysis. We developed a computational approach that benchmarks 7 statistical models for overdispersed ST count data, providing a sensitive framework and evaluation metric for selecting the model that best fits and predicts the ST data. We also benchmarked performance in identifying spatially aggregated gene signatures that are significantly associated with the identified spatiotemporal trajectories. Applying our framework to ST datasets, we found that the negative binomial (NB) and zero-inflated NB models outperform Poisson, quasi-Poisson, zero-inflated Poisson, hurdle, and linear mixed-effects models for genes with medium and high variation, whereas all models perform equally well for low-count genes. We further used rootograms as a metric to assess goodness of fit. Applying our framework to public ST datasets, we revealed genes that are associated with a predefined spatiotemporal trajectory and reflect tumor-immune interactions in 10x Visium ST data, and genes that characterize the structures of the mouse hippocampus in Slide-seqV2 ST data.
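To illustrate why overdispersion matters for model choice, the following minimal Python sketch compares a Poisson model with a negative binomial (NB) model on one gene's counts using the Akaike information criterion (AIC). It is not the authors' framework: it covers only two of the seven models, uses a method-of-moments NB dispersion estimate instead of a maximum-likelihood fit, ignores spatial covariates, and runs on hypothetical counts.

```python
import math
import statistics

def poisson_loglik(counts, lam):
    # Sum of log P(k | lambda) = k*log(lambda) - lambda - log(k!)
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in counts)

def nb_loglik(counts, mu, size):
    # NB with mean mu and size r, so that variance = mu + mu**2 / r
    r = size
    return sum(
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu))
        for k in counts
    )

def compare_models(counts):
    """Dispersion index plus Poisson and NB AIC for one gene.

    Assumes at least two counts with a positive mean. The NB AIC is None
    when the counts are not overdispersed (variance <= mean).
    """
    mu = statistics.mean(counts)
    var = statistics.variance(counts)
    result = {
        "dispersion_index": var / mu,
        # The Poisson MLE of lambda is the sample mean (1 parameter).
        "poisson_aic": 2 * 1 - 2 * poisson_loglik(counts, mu),
        "nb_aic": None,
    }
    if var > mu:
        # Method-of-moments size estimate, not the MLE (2 parameters).
        size = mu ** 2 / (var - mu)
        result["nb_aic"] = 2 * 2 - 2 * nb_loglik(counts, mu, size)
    return result

# Hypothetical counts for one gene across spots: many zeros, long tail.
counts = [0, 0, 1, 0, 12, 3, 0, 25, 2, 0, 7, 0, 1, 18, 0, 4]
print(compare_models(counts))
```

A dispersion index well above 1 together with a lower NB AIC points to the NB model, in line with the abstract's finding for genes with medium and high variation.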

Virtual: Dissecting cellular and molecular alterations in pre-invasive to invasive lung cancer progression at single-cell resolution
COSI: rsg
  • Liron Yoffe, Weill Cornell, United States
  • Alain Borczuk, Weill Cornell, United States
  • Timothy McGraw, Weill Cornell, United States
  • Olivier Elemento, Weill Cornell, United States
  • Nasser Altorki, Weill Cornell, United States
  • Vivek Mittal, Weill Cornell, United States


Presentation Overview

Non-small cell lung cancer (NSCLC), the most common form of lung cancer, is a leading cause of cancer-related death worldwide, largely because it is diagnosed at advanced metastatic stages. Increased implementation of low-dose computed tomography (CT) screening has led to the identification of non-solid lesions, or ground-glass opacities (GGOs), that are not visible on plain chest radiography and can harbor very early-stage tumors. Growth of these lesions and/or development of a solid component can indicate more aggressive behavior. Low-dose CT screening reduced lung cancer mortality by more than 20%, underscoring the value of early detection and intervention. However, the cellular and molecular determinants that contribute to the malignant progression of these early preinvasive lesions remain poorly understood. To assess intratumoral heterogeneity and the associated dynamic changes in the immune contexture during early lung carcinogenesis, we performed single-cell RNA-seq and Hyperion imaging on freshly harvested, surgically resected tissues from 21 patients (matched solid and non-solid components of lesions and adjacent uninvolved lung), representing a spectrum from preneoplasia to invasive disease. Analysis of more than 300,000 cells identified cellular, transcriptomic, and metabolic alterations as well as tumor-stroma cross-talk pathways. Specifically, progression from preinvasive to invasive disease was associated with significant changes in the immune landscape: an increase in exhausted CD4/CD8 T cells, suppressive Tregs, polymorphonuclear myeloid-derived suppressor cells, and M2-like macrophages, and a decrease in cytotoxic CD8 T cells, NK cells, and inflammatory cDC2 cells. These changes suggest that activation of an immunosuppressive/anti-inflammatory microenvironment, together with suppression of immune activation, may enable tumor progression.

Virtual: Distal cis-regulatory elements regulate tissue-specific water-deficit response
COSI: rsg
  • Maud Fagny, GQE-Le Moulon - Université Paris-Saclay, INRAE, CNRS, AgroParisTech, France
  • Johann Joets, GQE-Le Moulon - Université Paris-Saclay, INRAE, CNRS, AgroParisTech, France
  • Olivier Turc, LEPSE, Univ Montpellier, INRAE, Institut Agro, France
  • Claude Welcker, LEPSE, Univ Montpellier, INRAE, Institut Agro, France
  • Anthony Venon, GQE-Le Moulon - Université Paris-Saclay, INRAE, CNRS, AgroParisTech, France
  • Harry Belcram, GQE-Le Moulon - Université Paris-Saclay, INRAE, CNRS, AgroParisTech, France
  • Francois Tardieu, LEPSE, Univ Montpellier, INRAE, Institut Agro, France
  • Sylvie Coursol, IJPB, INRAE, AgroParisTech, Université Paris-Saclay, France
  • Clémentine Vitte, GQE-Le Moulon - Université Paris-Saclay, INRAE, CNRS, AgroParisTech, France


Presentation Overview

Understanding the molecular bases of crop responses to the environment is crucial for adapting cultivated varieties to climate change. Crop responses to the environment are mainly driven by developmental modifications and determined by complex interactions between genetic background and environment (GxE). However, these GxE interactions remain largely unknown. Distal cis-regulatory elements (dCRE), including enhancers and silencers, are key players in the spatio-temporal coordination of gene expression during development and in response to the environment. They activate complex genome-wide regulatory networks. While annotating dCRE in a genome is now largely feasible, identifying their target genes remains challenging, because dCRE do not necessarily target the closest genes and can target different genes in different cell types. For these reasons, the contribution of dCRE-articulated regulatory networks to crop responses to the environment remains poorly understood.
Using maize, whose dCRE have been well characterized, and its response to water deficit as a model, we investigate the extent of gene regulatory network rewiring in response to the environment across several tissues. We first generated RNA-seq data from seven tissues of the inbred line B73 grown under two watering conditions. Using the NetZoo software suite, we integrated these data with genomic and DNA methylation data to model the tissue- and condition-specific regulatory networks linking the transcription factors that bind dCRE to their potential target genes.
We first show that gene regulatory network inference methods are efficient tools for identifying the target genes of dCRE involved in development. Second, we find that the response of different tissues to water deficit varies both in the magnitude of regulatory network rewiring and in the categories of biological functions activated. Surprisingly, silks (the styles of the female flowers) and cob show no changes. In contrast, leaves show large-scale rewiring in response to water deficit, activating functions such as osmotic stress response, control of cell proliferation, carbohydrate biosynthesis and post-transcriptional regulation. The regulation of biological processes in internodes (maize stem) is also modified, in particular mitochondrial respiration and cell wall component synthesis, in line with what is known at the macroscopic level. In each case, we also identify the main dCRE responsible for this rewiring and the transcription factors involved. In conclusion, our results show that, at the molecular level, the maize response to water deficit involves a profound rewiring of the gene regulatory networks articulated by dCRE. Mutations in these elements could therefore play a crucial role in determining tolerance to water deficit in maize.
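Network inference itself is beyond a short example, but once condition-specific networks are available, rewiring can be summarized in several ways. The following minimal Python sketch assumes each network is a mapping from (TF, target gene) pairs to edge weights and reports (i) the mean absolute change in edge weight between watering conditions and (ii) each gene's differential targeting, the change in the sum of its incoming edge weights. These summary measures, the function names and the toy weights are illustrative assumptions, not the NetZoo API or the authors' exact statistics.

```python
def targeting_scores(network):
    """Sum of incoming edge weights for each target gene.

    network: dict mapping (tf, gene) -> edge weight for one tissue and
    watering condition (hypothetical format).
    """
    scores = {}
    for (_tf, gene), weight in network.items():
        scores[gene] = scores.get(gene, 0.0) + weight
    return scores

def rewiring(net_control, net_stress):
    """Mean absolute edge-weight change and per-gene differential
    targeting (stress minus control). Edges missing from one network are
    treated as weight 0."""
    edges = set(net_control) | set(net_stress)
    mean_abs_change = sum(
        abs(net_stress.get(e, 0.0) - net_control.get(e, 0.0)) for e in edges
    ) / len(edges)
    t_ctrl = targeting_scores(net_control)
    t_stress = targeting_scores(net_stress)
    genes = set(t_ctrl) | set(t_stress)
    diff = {g: t_stress.get(g, 0.0) - t_ctrl.get(g, 0.0) for g in genes}
    return mean_abs_change, diff

# Hypothetical leaf networks: well-watered (ww) and water deficit (wd).
leaf_ww = {("TF1", "geneA"): 1.0, ("TF1", "geneB"): 0.5, ("TF2", "geneA"): 0.2}
leaf_wd = {("TF1", "geneA"): 2.0, ("TF1", "geneB"): 0.5, ("TF2", "geneA"): -0.3}
change, diff = rewiring(leaf_ww, leaf_wd)
print(round(change, 3), {g: round(d, 3) for g, d in sorted(diff.items())})
```

Comparing the mean absolute change across tissues gives one way to contrast, for example, a strongly rewired leaf network with an unchanged silk network.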

Virtual: Evaluating deep learning for predicting epigenetic profiles
COSI: rsg
  • Shushan Toneyan, Cold Spring Harbor Laboratory, United States
  • Ziqi Tang, Cold Spring Harbor Laboratory, United States
  • Peter Koo, Cold Spring Harbor Laboratory, United States


Presentation Overview

Deep learning (DL) models have been successful at predicting epigenetic profiles from DNA sequence. Most approaches frame the prediction task as binary classification, using peak calling to convert the coverage values of epigenetic profiles into the presence or absence of activity. In contrast, more recent quantitative models predict the coverage values directly, avoiding peak calling altogether. Because models are trained on different task formats with different training configurations, a major bottleneck in the field is the inability to fairly assess the novelty of proposed models and their utility for downstream biological discovery. Here, we propose a unified evaluation framework for DL models trained on regulatory genomics data and use it to systematically compare various quantitative and binary models at predicting chromatin accessibility from ATAC-seq data. We reveal that factors such as coverage resolution, dataset selection, data augmentations, and task format can significantly affect generalization performance, including in a popular downstream application: predicting variant effects via in silico mutagenesis. In addition, we introduce a new method for evaluating model robustness, which provides new dimensions for model selection. Furthermore, to assess the utility of models for providing new biological insights, we define a new benchmark dataset for estimating model performance at predicting functional genomic assay results. Our evaluation framework provides an avenue to assess genomic DL models comprehensively, fairly and systematically, enabling the identification of the key innovations in architecture and training that propel the field forward.
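The two task formats can be derived from the same coverage track. The following minimal Python sketch averages base-resolution coverage into bins (the coverage resolution) to obtain a quantitative target and then thresholds the binned values into a binary target. The fixed threshold is a simplifying stand-in for peak calling (peak callers such as MACS2 model the local background instead), and the track values are hypothetical.

```python
def bin_coverage(coverage, resolution):
    """Average base-resolution coverage into non-overlapping bins.

    A trailing partial bin is averaged over the positions it contains.
    """
    if resolution < 1:
        raise ValueError("resolution must be a positive integer")
    return [
        sum(coverage[i:i + resolution]) / len(coverage[i:i + resolution])
        for i in range(0, len(coverage), resolution)
    ]

def binarize(coverage, threshold):
    """Presence (1) / absence (0) labels from a fixed coverage threshold."""
    return [1 if value >= threshold else 0 for value in coverage]

# Hypothetical 12-bp ATAC-seq coverage track.
track = [0, 0, 1, 3, 8, 9, 7, 2, 0, 0, 0, 1]
quantitative = bin_coverage(track, 4)          # [1.0, 6.5, 0.25]
binary = binarize(quantitative, threshold=5)   # [0, 1, 0]
print(quantitative, binary)
```

Changing `resolution` or `threshold` changes the training target itself, which is one reason models trained on different formats are hard to compare directly.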

Virtual: From gene regulatory network inference to multiscale modeling: understanding the crosstalk between immune and cancer cells in Chronic Lymphocytic Leukemia
COSI: rsg
  • Malvina Marku, INSERM, Cancer Research Center of Toulouse, France
  • Hugo Chenel, INSERM, Cancer Research Center of Toulouse, France
  • Julie Bordenave, Université Toulouse III-Paul Sabatier, France
  • Nina Verstraete, INSERM, Cancer Research Center of Toulouse, France
  • Leila Khajavi, University Cancer Institute of Toulouse Oncopole, France
  • Flavien Raynal, INSERM, Cancer Research Center of Toulouse, France
  • Vera Pancaldi, INSERM, Cancer Research Center of Toulouse, France


Presentation Overview

The tumour microenvironment (TME) can be seen as a complex system containing multiple cell types that interact through contact and cytokine exchange. In particular, immune cells play a major role in cancer development, and their characterisation allows a better understanding of the TME. In this context, transcriptomic time courses make it possible to study the gene regulatory networks and interactions between myeloid immune cells and cancer cells, to obtain relevant information about the underlying biology, and to identify novel molecular interactions and potential drug targets.
In this project, we aim to characterise the formation of Nurse-Like Cells (NLC), a type of tumour-associated macrophage found in the lymph nodes of Chronic Lymphocytic Leukaemia (CLL) patients, and to investigate their cross-talk with cancer cells from a network perspective. To this end, building on in vitro experiments of macrophage-CLL co-culture, we use multi-scale approaches to study the system at the molecular and cellular scales.
First, we built a gene regulatory network of macrophage polarisation with a literature-based approach, identifying the main molecular regulators that define the macrophage phenotype in the presence of different extracellular stimuli, including CLL-secreted cytokines. Additionally, we inferred regulatory networks in the CLL cells by applying several inference methods to RNA-seq time series of CLL cells isolated from a 13-day co-culture with macrophages. Furthermore, we applied transcription factor (TF) activity analysis to reveal the processes taking place inside the CLL cells as they interact with the macrophages.
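The abstract does not specify the TF activity method. As a minimal Python sketch of the general idea, the following scores each TF at each time point as the mean of its targets' z-scored expression, with repressed targets sign-flipped. This is a simplified stand-in for weighted-mean or regression-based approaches, and the regulons and expression values are hypothetical.

```python
import statistics

def zscore_rows(matrix):
    """Z-score each gene's values across time points (population SD)."""
    scaled = {}
    for gene, values in matrix.items():
        mean = statistics.mean(values)
        sd = statistics.pstdev(values)
        # Constant genes carry no temporal signal; map them to zeros.
        scaled[gene] = [(v - mean) / sd if sd > 0 else 0.0 for v in values]
    return scaled

def tf_activity(matrix, regulons):
    """Signed mean z-score of each TF's targets at every time point.

    matrix: dict gene -> expression values over the same time points.
    regulons: dict tf -> list of (target_gene, mode), mode +1 for
    activation and -1 for repression. TFs without measured targets are
    omitted from the result.
    """
    z = zscore_rows(matrix)
    n_timepoints = len(next(iter(matrix.values())))
    activity = {}
    for tf, targets in regulons.items():
        usable = [(gene, mode) for gene, mode in targets if gene in z]
        if not usable:
            continue
        activity[tf] = [
            statistics.mean(mode * z[gene][t] for gene, mode in usable)
            for t in range(n_timepoints)
        ]
    return activity

# Hypothetical expression over three co-culture time points.
expr = {"geneA": [1.0, 2.0, 3.0], "geneB": [3.0, 2.0, 1.0], "geneC": [5.0, 5.0, 5.0]}
regulons = {"TF_X": [("geneA", 1), ("geneB", -1)],
            "TF_Y": [("geneC", 1)],
            "TF_Z": [("geneZ", 1)]}
print(tf_activity(expr, regulons))
```

Here TF_X rises over time because its activated target goes up while its repressed target goes down.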
Second, to study the spatio-temporal dynamics of the cell populations, we built an agent-based model of the co-culture, identifying important cellular processes as well as patient features that determine the system’s longitudinal behaviour. We aim to integrate these two approaches into a multi-scale dynamical model in which cell behaviour is determined both by cell-cell interactions and by intracellular molecular regulation. With this multi-level approach, we aim to recapitulate the processes that lead to the formation of tumour-protective macrophages, specifically highlighting their effect on the temporal dynamics of the CLL cell population. Finally, we hope to identify novel molecular targets to disrupt this interaction, which can strongly increase resistance to therapies in CLL and other cancers.

Virtual: Investigation of Universal Stress Proteins from Mycobacterium tuberculosis
COSI: rsg
  • Onyewuchi Njoku, University of Ibadan, Nigeria


Presentation Overview

According to Wikipedia, the universal stress protein (USP) domain defines a superfamily of conserved genes found in bacteria, archaea, fungi, protozoa and plants, which play a role in the adaptation of bacteria to high temperature. USPs are expressed in Mycobacterium tuberculosis, and studies have suggested that they ease the adaptation of pathogens to the host environment, thereby contributing to pathogenicity and promoting the latency of tuberculosis. We applied bioinformatics tools and developed analytics resources to categorize USP sequences predicted from several Mycobacterium tuberculosis genomes. Of the Mycobacterium tuberculosis USP sequences we compiled, about 75 percent contain a single USP domain, while the rest contain more than one. We developed analytics in the Perl scripting language to investigate the involvement of USPs in the adaptation of Mycobacterium tuberculosis to the host environment.
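The domain-counting step can be illustrated briefly. The following is a minimal Python sketch, not the authors' Perl scripts: it assumes domain-search hits are already available as (protein ID, domain name) pairs, uses the Pfam-style name `Usp` as an assumed domain label, and runs on hypothetical protein IDs.

```python
from collections import Counter

def domain_counts(hits, domain_name="Usp"):
    """Number of `domain_name` hits per protein.

    hits: iterable of (protein_id, domain_name) pairs, e.g. parsed from a
    domain-annotation table; "Usp" is an assumed Pfam-style label.
    """
    return Counter(pid for pid, dom in hits if dom == domain_name)

def single_domain_fraction(counts):
    """Fraction of USP-containing proteins with exactly one USP domain."""
    if not counts:
        return 0.0
    return sum(1 for n in counts.values() if n == 1) / len(counts)

# Hypothetical hits; protein IDs are placeholders.
hits = [("P1", "Usp"), ("P2", "Usp"), ("P3", "Usp"),
        ("P4", "Usp"), ("P4", "Usp"), ("P5", "Other")]
counts = domain_counts(hits)
print(dict(counts), single_domain_fraction(counts))
```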

Virtual: Modelling the mask effectiveness in preventing SARS-CoV-2 transmission
COSI: rsg
  • Chacha Issarow, University of Cape Town, South Africa
  • Robin Wood, Desmond Tutu Health Foundation, South Africa
  • Nicola Mulder, University of Cape Town, South Africa
  • Linda-Gail Bekker, Desmond Tutu Health Foundation, South Africa


Presentation Overview

Chacha Issarow¹, Robin Wood¹, Nicola Mulder², Linda-Gail Bekker¹

¹ The Desmond Tutu Health Foundation, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, South Africa
² Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, CDRI Wellcome Trust Centre, University of Cape Town, South Africa


Abstract
Background: Masks, including N95 respirators, surgical masks, and cloth masks, reduce the wearer's inhalation of airborne infectious particles, such as SARS-CoV-2, suspended in the air.
Methods: We developed and simulated a flexible mathematical model under non-steady-state conditions to assess the effectiveness of masks in reducing the spread of SARS-CoV-2.
Results: We found that wearing masks in settings with large space volumes and high ventilation rates, combined with short exposure durations, could substantially reduce SARS-CoV-2 transmission. For a mask with a high efficacy of 95%, the probability of acquiring SARS-CoV-2 infection was 0.450 at a ventilation rate of 1 air change per hour (ACH), 0.260 at 2 ACH, 0.095 at 6 ACH, and 0.046 at 12 ACH. For a mask with a low efficacy of 40%, the probability of infection was 0.761 at 1 ACH, 0.509 at 2 ACH, 0.211 at 6 ACH, and 0.111 at 12 ACH. For a mask with an efficacy of 95%, the probability of acquiring SARS-CoV-2 infection was 0.401 in a space volume of 20 m³, 0.377 in 60 m³, 0.322 in 150 m³, and 0.293 in 210 m³. For a mask with an efficacy of 40%, the probability of infection was 0.685 in a space volume of 20 m³, 0.609 in 60 m³, 0.466 in 150 m³, and 0.398 in 210 m³. With respect to duration of exposure, for a mask with a high efficacy of 95%, the probability of acquiring SARS-CoV-2 infection was 0.409 after 100 min and 0.701 after 200 min of exposure; for a mask with a low efficacy of 40%, it was 0.601 after 100 min and 0.961 after 200 min. Our results indicate that masks, even those with low efficacy, reduce the likelihood of SARS-CoV-2 transmission.
Conclusion: Although they do not fully protect wearers against airborne infection, all types of masks reduce the likelihood of SARS-CoV-2 transmission, to an extent that depends on the setting’s space volume, ventilation rate, and duration of exposure.
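The poster's model equations and parameters are not given here. As a minimal Python sketch of one standard non-steady-state formulation, the following uses a Wells-Riley-type dose-response model in which a well-mixed room starts free of infectious quanta, ventilation is the only removal process, and the mask reduces only the susceptible wearer's inhaled dose by its efficacy. The quanta emission rate, breathing rate and number of infectors are illustrative assumptions, so the outputs will not reproduce the probabilities reported above; the sketch only reproduces the qualitative trends (larger volume, more air changes, shorter exposure and higher efficacy all lower the risk).

```python
import math

def infection_probability(volume_m3, ach, minutes, mask_efficacy,
                          quanta_per_h=25.0, breathing_m3_per_h=0.5,
                          infectors=1):
    """Wells-Riley-type infection probability without assuming steady state.

    Assumes a well-mixed room with no quanta at t = 0 and ventilation as
    the only removal process, so the quanta concentration is
        C(t) = I*q / (V*ach) * (1 - exp(-ach*t)).
    The inhaled dose is p * (1 - efficacy) * integral of C(t) over the
    exposure. Default q, p and I are illustrative assumptions.
    """
    if volume_m3 <= 0 or ach <= 0:
        raise ValueError("volume and air changes per hour must be positive")
    t = minutes / 60.0  # exposure time in hours
    # Closed-form integral of C(t) from 0 to t.
    integral = (infectors * quanta_per_h / (volume_m3 * ach)) * (
        t - (1.0 - math.exp(-ach * t)) / ach
    )
    dose = breathing_m3_per_h * (1.0 - mask_efficacy) * integral
    return 1.0 - math.exp(-dose)

for ach in (1, 2, 6, 12):
    print(ach,
          round(infection_probability(60, ach, 100, 0.95), 4),
          round(infection_probability(60, ach, 100, 0.40), 4))
```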

Virtual: PAS-GDC: Annotated ACMG gene-disease database and interactive search application for translational research in genomics and precision medicine
COSI: rsg
  • Raghunandan Wable, Rutgers, The State University of New Jersey, United States
  • Anirudh Pappu, Rutgers, The State University of New Jersey, United States
  • Achuth Nair, Rutgers, The State University of New Jersey, United States
  • Khushbu Patel, Rutgers, The State University of New Jersey, United States
  • Dinesh Mendhe, Rutgers, The State University of New Jersey, United States
  • Shreyas Bolla, Rutgers, The State University of New Jersey, United States
  • Sahil Mittal, Rutgers, The State University of New Jersey, United States
  • Habiba Abdelhalim, Rutgers, The State University of New Jersey, United States
  • Zeeshan Ahmed, Rutgers, The State University of New Jersey, United States


Presentation Overview

To practice precision medicine efficiently in clinical settings, it is important to integrate patients' genomic profiles with their electronic health records (EHR). In basic science and genomics, disease definitions are simplified; in the clinical world, however, diseases are classified and identified by their International Classification of Diseases (ICD) codes, maintained by the World Health Organization (WHO). In this era of big data, human-related biological databases continue to grow in both number and volume, posing unprecedented challenges in data curation, storage, processing, analysis, and dissemination. Despite their limitations, these databases help in interpreting disease taxonomy, etiology, and pathogenesis. However, no comprehensive database yet exists that links clinical codes (e.g., ICD) with genomic data (e.g., genes and variants). In this project, we aim to support translational research in genomics and precision medicine by developing an annotated gene-disease-code database accessible through a cross-platform, user-friendly, interactive search application: PROMIS-APP-SUITE: Gene Disease Codes (PAS-GDC). Our scope is limited to the list of genes approved by the American College of Medical Genetics and Genomics (ACMG), an organization in the medical genetics field that is responsible for developing internationally accepted guidelines for gene-variant interpretation. Our application supports users in searching and integrating genes, diseases, and clinical codes, and allows them to export results in text formats to facilitate sharing and importing for translational research.
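The search-and-export workflow can be sketched in a few lines. The following minimal Python sketch assumes a flat table of gene, disease and ICD-10 code records and performs a case-insensitive substring search followed by tab-separated export. The two records are illustrative examples rather than PAS-GDC content, and the three-column schema is hypothetical.

```python
import csv
import io

FIELDS = ["gene", "disease", "icd10"]

# Illustrative records only; not taken from PAS-GDC.
RECORDS = [
    {"gene": "BRCA1", "disease": "Malignant neoplasm of breast", "icd10": "C50"},
    {"gene": "LDLR", "disease": "Pure hypercholesterolaemia", "icd10": "E78.0"},
]

def search(records, query):
    """Case-insensitive substring search over gene, disease and code fields."""
    q = query.lower()
    return [r for r in records if any(q in r[f].lower() for f in FIELDS)]

def export_tsv(rows):
    """Serialize search results as tab-separated text for sharing."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

print(export_tsv(search(RECORDS, "breast")), end="")
```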

Virtual: PathwayGNN: an explainable graph neural network to predict drug response in cancer
COSI: rsg
  • Durdam Das, Fraunhofer ITEM, Division of Personalized Tumor Therapy, Regensburg, Germany
  • Karan Pathak, Albert-Ludwigs-University Freiburg, Faculty of Engineering, Department of Computer Science, Bioinformatics Group, Germany
  • Christoph A. Klein, Chair of Experimental Medicine and Therapy Research, University of Regensburg | Fraunhofer ITEM, Regensburg, Germany
  • Martin Hoffmann, Fraunhofer ITEM, Division of Personalized Tumor Therapy, Regensburg, Germany
  • Rolf Backofen, Albert-Ludwigs-University Freiburg, Faculty of Engineering, Department of Computer Science, Bioinformatics Group, Germany
  • Van Dinh Tran, Albert-Ludwigs-University Freiburg, Faculty of Engineering, Department of Computer Science, Bioinformatics Group, Germany


Presentation Overview

Precision medicine in oncology focuses on individualized treatment: rather than the classical one-size-fits-all therapy approach, it harnesses new knowledge and technology to interpret variation in an individual’s genes, lifestyle and environment in order to prevent, diagnose, or treat cancer. As a data-driven approach, it entails solving computational tasks, one of which, drug response prediction, is the subject of this study. We explore the application of graph neural networks to leverage complex pathway and drug structure information, enriched with multiomics data and physicochemical properties, respectively. The limited explainability of such AI models is a major roadblock to real-world application, which we also address in this study. Building on recent advances in deep learning, we present PathwayGNN, a novel method for drug response prediction that integrates multiomics data from cancer cell lines, signaling pathways in cancer and features of drug structure. We compared our method with GraphDRP (Nguyen et al. 2021), a state-of-the-art method that uses a graph neural network to predict IC50 as a drug response measure. In the blind cell line setting, where the model predicts drug response for unseen cancer cell lines, our model showed a 13% improvement in Pearson correlation and a 19% reduction in root mean squared error compared with GraphDRP. In post-hoc model interpretation, we found that for responding cell lines, in which anticancer drugs have a high killing effect as measured by Emax (the maximum effect achieved by a drug within a concentration range), our model pays more attention to drug target genes. For non-responding cell lines, the opposite was true.
These findings were confirmed by gene set enrichment analysis for four drugs: Navitoclax (also known as ABT-263, a Bcl-2 inhibitor), Palbociclib (also known as Ibrance, a CDK4/6 inhibitor), Vinblastine (also known as Velban, a vinca alkaloid used to treat a wide variety of cancers), and Trametinib (also known as Mekinist, a kinase inhibitor used to treat melanoma, NSCLC and thyroid cancer). We confirmed that the top 20% most important features of our model are enriched in drug target genes for the responding cell lines. This analysis supports the view that representing prior biological knowledge of pathway structure and drugs as graphs helps identify key target genes for drug response prediction.
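The abstract reports gene set enrichment analysis; a simpler over-representation test asks a closely related question. The following minimal Python sketch (a swapped-in technique, not the authors' GSEA) selects the top 20% of genes by a model's importance scores and computes a one-sided hypergeometric p-value for their overlap with a drug-target set. The importance scores and the target set are hypothetical.

```python
from math import comb

def top_fraction(importance, fraction=0.2):
    """Genes in the top `fraction` of importance scores (at least one)."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return set(ranked[:k])

def hypergeom_enrichment_p(n_total, n_targets, n_selected, n_overlap):
    """One-sided hypergeometric p-value P(X >= n_overlap).

    n_total: genes scored by the model; n_targets: annotated drug targets;
    n_selected: genes in the top set; n_overlap: top genes that are targets.
    """
    upper = min(n_targets, n_selected)
    hits = sum(comb(n_targets, k) * comb(n_total - n_targets, n_selected - k)
               for k in range(n_overlap, upper + 1))
    return hits / comb(n_total, n_selected)

# Hypothetical importance scores and drug-target annotations.
scores = [0.9, 0.8, 0.1, 0.2, 0.3, 0.05, 0.15, 0.25, 0.35, 0.45]
importance = {f"g{i}": s for i, s in enumerate(scores)}
targets = {"g0", "g1", "g5"}
top = top_fraction(importance)
p = hypergeom_enrichment_p(len(importance), len(targets), len(top), len(top & targets))
print(sorted(top), round(p, 4))
```

Unlike GSEA, this test ignores the ranking within and outside the top set; it only asks whether targets are over-represented among the selected features.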

Virtual: The Drug Response Library: another R-package for the evaluation of drug response curves
COSI: rsg
  • Martin Hoffmann, Personalized Tumor Therapy, Fraunhofer ITEM Regensburg, Germany
  • Durdam Das, Personalized Tumor Therapy, Fraunhofer ITEM Regensburg, Germany
  • Christoph Andreas Klein, Experimental Medicine and Therapy Research, University of Regensburg & Personalized Tumor Therapy, Fraunhofer ITEM Regensburg, Germany


Presentation Overview

Evaluating drug and small-molecule screening data for effects on cell viability and phenotypic characteristics is a central task in basic science and in industrial high-content screening. Software programs for this purpose have been developed over the last few decades, and several standard packages are publicly available. These have been applied to large screening efforts, such as the GDSC and CCLE studies, and the evaluated data are freely available from the corresponding databases. When working with these and in-house data, we encountered occasional numerical convergence problems that hampered fully automated processing. Moreover, confidence information on characteristic drug response measures such as Emax, IC50 and AUC was lacking, as were response saturation criteria and convenient summary graphics and tabular output. Here, we present our drug response library (drl) R package, which provides this functionality.

Common workflows start with data normalization followed by fitting of dose-response curves, mostly using the four-parameter log-logistic (Hill) function. Characteristics of these curves are then calculated, typically the maximum drug effect (Emax), the half-maximal inhibitory concentration (IC50), often used synonymously with the half-maximal effective concentration (EC50), or the area under the drug response curve (AUC). Although no single one of these values can in general fully specify drug action, each supplies valid information and all are in wide use. To safeguard fully automated processing, the drl package ensures convergence by providing optimized starting values for the log-logistic function as well as for three commonly applied hormesis models, automatically scaling the required computational effort, and iterating through different numerical algorithms. In addition, users can opt to calculate confidence intervals by three different standard methods and display results in dedicated custom graphics. We demonstrate the utility of the drl package with a case study on the GDSC2 data set, in which we find that our strictly data-based definition of effect characteristics may deviate substantially from the evaluation provided by the GDSC2 project and that confidence intervals can be large in certain settings.
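The relationships among these curve characteristics can be made concrete. The following minimal Python sketch assumes the four-parameter log-logistic curve has already been fitted; it does not reproduce the drl package's starting values, hormesis models, algorithm cycling or confidence intervals. It shows why the absolute IC50 coincides with EC50 only when the 50% level lies halfway between the upper and lower asymptotes, why IC50 is undefined when the response never falls below 50%, and one possible normalized AUC on a log10 concentration axis. Function names and the AUC normalization are assumptions.

```python
import math

def four_pl(conc, bottom, top, ec50, hill):
    """Four-parameter log-logistic (Hill) response; decreases from `top`
    to `bottom` with increasing concentration when hill > 0."""
    return bottom + (top - bottom) / (1.0 + (conc / ec50) ** hill)

def absolute_ic50(bottom, top, ec50, hill, level=0.5):
    """Concentration where the curve crosses `level` (e.g. 50% viability).

    Equals ec50 only when `level` lies halfway between top and bottom.
    Returns None when the curve never crosses `level`, e.g. when the
    response does not saturate below 50% viability.
    """
    if not bottom < level < top:
        return None
    return ec50 * ((top - level) / (level - bottom)) ** (1.0 / hill)

def normalized_auc(bottom, top, ec50, hill, c_min, c_max, n=200):
    """Trapezoidal area under the curve on a log10-concentration axis,
    divided by the width of the tested range (a flat curve at 1.0 gives 1.0)."""
    lo, hi = math.log10(c_min), math.log10(c_max)
    xs = [lo + (hi - lo) * i / n for i in range(n + 1)]
    ys = [four_pl(10 ** x, bottom, top, ec50, hill) for x in xs]
    area = sum((ys[i] + ys[i + 1]) / 2 * (xs[i + 1] - xs[i]) for i in range(n))
    return area / (hi - lo)

# Hypothetical fitted parameters on a viability scale (1.0 = untreated).
params = {"bottom": 0.2, "top": 1.0, "ec50": 1.0, "hill": 1.0}
print("Emax (top - bottom):", params["top"] - params["bottom"])
print("EC50:", params["ec50"], "absolute IC50:", absolute_ic50(**params))
print("normalized AUC, 0.01-100:", normalized_auc(**params, c_min=0.01, c_max=100.0))
```

With a residual viability of 20%, the absolute IC50 lies above the EC50, which illustrates how the two quantities can diverge when they are used synonymously.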

Virtual: Variant Effect Prediction Using Deep Neural Networks for Alzheimer's Disease
COSI: rsg
  • Alexander Y. Lan, Gladstone Institutes and University of California, San Francisco, United States
  • Soumya Kundu, Stanford University, United States
  • Lucas Kampman, Gladstone Institutes and University of California, San Francisco, United States
  • Anusri Pampari, Stanford University, United States
  • Anshul Kundaje, Stanford University, United States
  • M. Ryan Corces, Gladstone Institutes and University of California, San Francisco, United States


Presentation Overview

Alzheimer’s disease (AD), affecting over 50 million individuals worldwide, is characterized by a progressive loss of cognitive function for which no effective therapies currently exist. To identify new genes and regulatory pathways involved in the disease, numerous genome-wide association studies (GWAS) have been performed to find genetic variants, mostly single-nucleotide polymorphisms (SNPs), that are statistically associated with AD. However, over 90% of these variants reside in the noncoding genome and exert their effects without modifying protein-coding sequences. Instead, these noncoding variants affect how genes are expressed by altering the sequences of enhancers and promoters, which are activated by sequence-specific transcription factors (TFs). Unlike amino-acid-altering variants within genes, noncoding variants are hard to interpret: the complex grammar of TF binding and gene regulation makes it challenging to predict which variants are truly functional and which genes and cell types they affect.

To pinpoint candidate causal variants in AD, we developed convolutional neural networks that learn the base-resolution grammar and syntax of the noncoding genome, enabling cell type-specific prediction of TF binding from DNA sequence alone. Highly predictive models (AUC ~0.9, Pearson r ~0.8) were trained using single-cell chromatin accessibility data from seven distinct human brain cell types. Optimizations were made to expand the models' predictive range and to correct for the Tn5 bias associated with scATAC-seq protocols.

AD GWAS-implicated noncoding variants were prioritized by their predicted functional effect, determined by introducing the SNP into the model's input sequence and assessing the predicted difference in chromatin accessibility. These rankings were validated computationally using DeepSHAP, a pseudo-back-propagation algorithm for model interpretation. Biological validation using massively parallel reporter assays (MPRAs) corroborated the model predictions for the top variants. Our models both verified previously known variants, such as rs636317 and rs13025717, and nominated new candidate causal variants. To make these models broadly accessible, we also present the Variant Effect Prediction Platform, a tool that gives researchers a scalable and accurate computational method for evaluating and prioritizing their variants of interest across genetic brain diseases.
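The in silico perturbation step can be sketched independently of the network architecture. In the following minimal Python sketch, `predict` is any function mapping a DNA sequence to a predicted accessibility value; a hypothetical motif-counting function (`toy_predict`, using an arbitrarily chosen CACGTG motif) stands in for the trained convolutional network. Scoring effects as the log2 fold change of alternate over reference predictions is an assumption; the abstract states only that the predicted difference was assessed.

```python
import math

def apply_snp(seq, pos, ref, alt):
    """Substitute the alternate allele at 0-based `pos`, checking the reference."""
    if seq[pos] != ref:
        raise ValueError(f"reference mismatch at {pos}: {seq[pos]} != {ref}")
    return seq[:pos] + alt + seq[pos + 1:]

def variant_effect(predict, seq, pos, ref, alt):
    """log2(prediction with alt / prediction with ref); positive means gain."""
    return math.log2(predict(apply_snp(seq, pos, ref, alt)) / predict(seq))

def rank_variants(predict, seq, variants):
    """Sort (pos, ref, alt) variants by absolute predicted effect, largest first."""
    scored = [(v, variant_effect(predict, seq, *v)) for v in variants]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)

def toy_predict(seq, motif="CACGTG"):
    """Hypothetical stand-in for a trained CNN: 1 + number of motif matches."""
    k = len(motif)
    return 1.0 + sum(seq[i:i + k] == motif for i in range(len(seq) - k + 1))

seq = "TTTCACGAGTTT"  # hypothetical sequence around two candidate SNPs
for variant, effect in rank_variants(toy_predict, seq, [(0, "T", "G"), (7, "A", "T")]):
    print(variant, effect)
```

The A>T change at position 7 completes the motif and doubles the toy prediction, so it ranks first; the other variant leaves the prediction unchanged.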

Virtual: Variants in cis-regulatory elements of selected genes in laryngeal squamous cell carcinoma (LSCC)
COSI: rsg
  • Magdalena Kostrzewska-Poczekaj, Institute of Human Genetics, Polish Academy of Sciences, Poland
  • Michał Miller, Nencki Institute of Experimental Biology, Polish Academy of Sciences, Poland
  • Małgorzata Jarmuż-Szymczak, Institute of Human Genetics, Polish Academy of Sciences, Poland
  • Katarzyna Kiwerska, Institute of Human Genetics, Polish Academy of Sciences, Poland
  • Reidar Grenman, Turku University Hospital and University of Turku, Finland
  • Małgorzata Wierzbicka, Institute of Human Genetics, Polish Academy of Sciences, and Poznan University of Medical Sciences, Poland
  • Małgorzata Rydzanicz, Department of Medical Genetics, Medical University of Warsaw, Poland
  • Rafał Płoski, Department of Medical Genetics, Medical University of Warsaw, Poland
  • Michał Dąbrowski, Nencki Institute of Experimental Biology, Polish Academy of Sciences, Poland
  • Maciej Giefing, Institute of Human Genetics, Polish Academy of Sciences, Poland


Presentation Overview:

In this study, we used targeted next-generation sequencing of proximal enhancers and promoters defined by the ENCODE and FANTOM consortia to identify regulome variants in LSCC and to study their effects on the expression of adjacent protein-coding genes.
The regulome, 46 Mbp in total, was sequenced in 49 LSCC samples, consisting of 7 LSCC cell lines and 42 tumors with paired blood samples, using an Illumina HiSeq 1500 sequencer. We selected regions with variants found in at least two samples that (I) had a variant allele frequency (VAF) > 20% and (II) affected conserved nucleotides (phyloP30way Genome Browser track; conservation > 0.0). Downstream filtering for conservation and validation by DNA pyrosequencing left 12 variants corresponding to 5 genes. Within this group, the PHC1-related variant chr12:008914173-C>G (hg38) creates the TF binding motif TGACGTCA, which is recognized by several TFs, including CREB1 and JUN, both abundantly expressed in LSCC. We tested the effect of this variant in a reporter assay. Indeed, the variant resulted in up to 3-fold upregulation of PHC1 promoter activity in four tested LSCC cell lines. PHC1 encodes a component of Polycomb complexes involved in gene repression.
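The region-selection criteria can be written as a simple filter. This is a sketch under one reading of the abstract: a region is kept when qualifying variants (VAF > 20% and phyloP > 0.0) are observed in at least two distinct samples. The record layout (`sample`, `region`, `vaf`, `phylop` keys), the function name `select_regions` and the example records are hypothetical.

```python
from collections import defaultdict


def select_regions(calls, min_samples=2, min_vaf=0.20, min_phylop=0.0):
    """Return sorted region IDs in which qualifying variants occur in at least
    `min_samples` distinct samples.

    A call qualifies if its VAF exceeds `min_vaf` and its phyloP score exceeds
    `min_phylop`. Each call is a dict with hypothetical keys:
    'sample', 'region', 'vaf' (fraction, 0-1) and 'phylop'.
    """
    samples_by_region = defaultdict(set)
    for call in calls:
        if call["vaf"] > min_vaf and call["phylop"] > min_phylop:
            samples_by_region[call["region"]].add(call["sample"])
    return sorted(r for r, samples in samples_by_region.items()
                  if len(samples) >= min_samples)


if __name__ == "__main__":
    calls = [  # invented example records
        {"sample": "S1", "region": "region_A", "vaf": 0.35, "phylop": 1.2},
        {"sample": "S2", "region": "region_A", "vaf": 0.28, "phylop": 0.8},
        {"sample": "S1", "region": "region_B", "vaf": 0.15, "phylop": 2.0},  # VAF too low
        {"sample": "S3", "region": "region_B", "vaf": 0.40, "phylop": 1.0},
        {"sample": "S4", "region": "region_C", "vaf": 0.50, "phylop": -0.5},  # not conserved
        {"sample": "S5", "region": "region_C", "vaf": 0.45, "phylop": 0.3},
    ]
    print(select_regions(calls))  # → ['region_A']
```

Counting distinct sample IDs in a set means several calls from the same sample in one region count only once toward the two-sample requirement.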
In summary, we demonstrate a transcription-enhancing effect of chr12:008914173-C>G on PHC1; this variant may be a novel long-tail driver in LSCC.
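Whether a single-nucleotide variant creates a TF binding site, as reported for the TGACGTCA motif above, can be checked by scanning both alleles. The sketch below assumes exact motif matching and uses invented flanking sequences that are NOT the real hg38 context of chr12:008914173; real analyses typically score position weight matrices instead. The function names are hypothetical. Because TGACGTCA is its own reverse complement, both strands give the same hits.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return "".join(COMPLEMENT[b] for b in reversed(seq.upper()))


def motif_hits(seq, motif):
    """0-based start positions of exact motif matches on either strand."""
    seq, motif = seq.upper(), motif.upper()
    targets = {motif, revcomp(motif)}
    k = len(motif)
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] in targets]


def creates_motif(left, ref, alt, right, motif):
    """True if the alt allele yields a motif match overlapping the variant
    position and the ref allele yields none."""
    pos, k = len(left), len(motif)

    def overlapping(seq):
        # Keep only hits whose window covers the variant position.
        return [i for i in motif_hits(seq, motif) if i <= pos < i + k]

    return bool(overlapping(left + alt + right)) and not overlapping(left + ref + right)


if __name__ == "__main__":
    # Invented flanks for illustration; NOT the real genomic context.
    print(creates_motif("GGTGAC", "C", "G", "TCAGG", "TGACGTCA"))  # → True
```

Restricting hits to windows that cover the variant position avoids crediting the variant with motif occurrences elsewhere in the flanks.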

25: Regulus infers signed, context-dependent and process-based regulatory circuits between few cell types
COSI: rsg
  • Marine Louarn, INSERM - IRISA, France
  • Guillaume Collet, Univ Rennes, France
  • Ève Barré, Univ Rennes, France
  • Thierry Fest, University hospital of Rennes, France
  • Olivier Dameron, University of Rennes 1, France
  • Anne Siegel, IRISA -- CNRS, France
  • Fabrice Chatonnet, Univ Rennes, CHU Rennes, Inserm, EFS, France


Presentation Overview:

Motivation: Transcriptional regulation determines the activation or inhibition of gene expression. It is carried out by transcription factors (TFs) binding to DNA in context-dependent regulatory regions. Understanding transcriptional regulation is fundamental for identifying the main regulators (TFs) of a dynamic system and for being able to modify biological states. For example, it may make it possible to pinpoint the TFs driving cancer cell transformation and to use them as therapeutic targets. Current methods for inferring transcriptional regulatory circuits, based on activity measurements of TFs, regions and/or genes, require large numbers of samples to rank candidate TF-gene relations. Whether a regulation is an activation or an inhibition is also context-dependent and hard to predict. Few existing methods are applicable to common experimental or clinical settings, where the number of samples is limited. We hypothesize that context-specific transcriptional regulatory circuits can be inferred from few samples by (1) fully integrating information on TF binding, gene expression and regulatory region activity, (2) reducing data complexity while preserving its dynamics by creating activity patterns and (3) using biology-based logical constraints to determine the global consistency of candidate TF-gene relations and to qualify them as activations or inhibitions.
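Point (2), reducing data to activity patterns, can be illustrated with a hypothetical discretization: for every pair of populations, record whether an entity's activity goes up, down or stays roughly the same. The function name `activity_pattern`, the log2 fold-change threshold, the pseudocount and the pairwise encoding are assumptions for illustration, not Regulus's actual pattern definition.

```python
import math
from itertools import combinations


def activity_pattern(values, populations, min_log2fc=1.0, pseudocount=1.0):
    """Hypothetical discretization of one entity (gene or region) across populations.

    For every pair (a, b) with a listed before b in `populations`, returns
    +1 if activity rises from a to b by at least `min_log2fc` (log2 scale),
    -1 if it falls by at least that much, and 0 otherwise.
    """
    pattern = {}
    for a, b in combinations(populations, 2):
        lfc = math.log2((values[b] + pseudocount) / (values[a] + pseudocount))
        if lfc >= min_log2fc:
            pattern[(a, b)] = 1
        elif lfc <= -min_log2fc:
            pattern[(a, b)] = -1
        else:
            pattern[(a, b)] = 0
    return pattern


if __name__ == "__main__":
    expr = {"P1": 1.0, "P2": 15.0, "P3": 7.0}  # invented values
    print(activity_pattern(expr, ["P1", "P2", "P3"]))
    # → {('P1', 'P2'): 1, ('P1', 'P3'): 1, ('P2', 'P3'): -1}
```

Comparing every pair of populations, rather than each population to a single baseline, is one way to keep the full set of contrasts available when there are only a handful of populations.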

Results: We introduce Regulus, a method that computes context-specific TF-gene relations from gene expression, regulatory region activity, genomic location and TF binding site data measured in a few cell populations. After gene expression and region activity values are aggregated into patterns that allow a full comparison of the different populations, the data are integrated into an RDF (Resource Description Framework) endpoint. A dedicated SPARQL (SPARQL Protocol and RDF Query Language) query retrieves all potential relations between expressed TFs and genes that involve active regulatory regions. These TF-region-gene relations are then filtered using a logical rule that ensures the regulation is consistent across the different populations and qualifies it as an activation or an inhibition. Regulus compares favorably with the closest circuit inference method, provides signed relations consistent with public databases and, when applied to biological data, identifies both known and potential new regulators. Altogether, Regulus is dedicated to transcriptional circuit inference in settings where samples are scarce and cell populations are closely related, in a specific biological context. Its innovative use of Semantic Web technologies (RDF and SPARQL) ensures that entities are correctly related, allows entities to be linked to databases of the Linked Open Data initiative and makes it easy to add supplementary datasets.
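The retrieval and filtering steps can be mimicked in plain Python as a hedged sketch. Here `candidate_relations` stands in for the SPARQL query by joining expressed TFs, their binding sites in active regions and the genes linked to those regions. `qualify_relation` applies a hypothetical sign rule, not necessarily the one Regulus uses: a relation is an activation if the TF and gene change in the same direction in every comparison where both change, an inhibition if they always change in opposite directions, and is discarded otherwise. All names and data are invented; the actual RDF/SPARQL implementation is in the Regulus repository.

```python
def candidate_relations(expressed_tfs, active_regions, tf_binding, region_genes):
    """Plain-Python stand-in for the retrieval query: (TF, region, gene) triples
    where an expressed TF binds an active region linked to a gene."""
    return [(tf, region, gene)
            for tf, region in sorted(tf_binding)
            if tf in expressed_tfs and region in active_regions
            for gene in sorted(region_genes.get(region, ()))]


def qualify_relation(tf_pattern, gene_pattern):
    """Hypothetical sign rule over shared population comparisons.

    Patterns map comparison keys to -1/0/+1. Only comparisons where both the
    TF and the gene change are considered. Returns 'activation' if they always
    move together, 'inhibition' if always opposite, else None (inconsistent
    or uninformative, so the relation would be discarded).
    """
    signs = {tf_pattern[k] * gene_pattern[k]
             for k in tf_pattern
             if k in gene_pattern and tf_pattern[k] != 0 and gene_pattern[k] != 0}
    if signs == {1}:
        return "activation"
    if signs == {-1}:
        return "inhibition"
    return None


if __name__ == "__main__":
    binding = {("TF1", "r1"), ("TF2", "r2"), ("TF1", "r3")}  # invented data
    links = {"r1": {"g1"}, "r2": {"g2", "g3"}, "r3": {"g4"}}
    print(candidate_relations({"TF1", "TF2"}, {"r1", "r2"}, binding, links))
    # → [('TF1', 'r1', 'g1'), ('TF2', 'r2', 'g2'), ('TF2', 'r2', 'g3')]
    tf = {("P1", "P2"): 1, ("P1", "P3"): 0, ("P2", "P3"): -1}
    gene = {("P1", "P2"): 1, ("P1", "P3"): 1, ("P2", "P3"): -1}
    print(qualify_relation(tf, gene))  # → activation
```

Requiring a single consistent sign across all informative comparisons is what lets a rule of this kind label a relation as activation or inhibition from only a few populations.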

Regulus is available at https://gitlab.com/teamDyliss/regulus